AITopics | long-tailed object detection

Collaborating Authors

long-tailed object detection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Neural Information Processing SystemsApr-24-2026, 19:55:22 GMT

C.9 Further Analysis on Existing Post-Processing Calibration Methods We compare NORCAL to the existing post-calibration methods in the main paper (cf.

artificial intelligence, machine learning, norcal, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Neural Information Processing SystemsDec-27-2025, 05:52:56 GMT

Long-tailed object detection (LTOD) aims to handle the extreme data imbalance in real-world datasets, where many tail classes have scarce instances. One popular strategy is to explore extra data with image-level labels, yet it produces limited results due to (1) semantic ambiguity---an image-level label only captures a salient part of the image, ignoring the remaining rich semantics within the image; and (2) location sensitivity---the label highly depends on the locations and crops of the original image, which may change after data transformations like random cropping.To remedy this, we propose RichSem, a simple but effective method, which is robust to learn rich semantics from coarse locations without the need of accurate bounding boxes. RichSem leverages rich semantics from images, which are then served as additional ``soft supervision'' for training detectors. Specifically, we add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection. The semantic branch is only used for training and is removed during inference. RichSem achieves consistent improvements on both overall and rare-category of LVIS under different backbones and detectors. Our method achieves state-of-the-art performance without requiring complex training and testing procedures. Moreover, we show the effectiveness of our method on other long-tailed datasets with additional experiments.

long-tailed object detection, name change, rich semantic and coarse location, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.65)

Add feedback

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Neural Information Processing SystemsDec-23-2025, 19:23:19 GMT

Vanilla models for object detection and instance segmentation suffer from the heavy bias toward detecting frequent objects in the long-tailed setting. Existing methods address this issue mostly during training, e.g., by re-sampling or re-weighting. In this paper, we investigate a largely overlooked approach --- post-processing calibration of confidence scores. We propose NorCal, Normalized Calibration for long-tailed object detection and instance segmentation, a simple and straightforward recipe that reweighs the predicted scores of each class by its training sample size. We show that separately handling the background class and normalizing the scores over classes for each proposal are keys to achieving superior performance. On the LVIS dataset, NorCal can effectively improve nearly all the baseline models not only on rare classes but also on common and frequent classes. Finally, we conduct extensive analysis and ablation studies to offer insights into various modeling choices and mechanisms of our approach. Our code is publicly available at https://github.com/tydpan/NorCal.

long-tailed object detection, model calibration, name change, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.67)

Add feedback

Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Neural Information Processing SystemsJan-20-2025, 02:56:07 GMT

Long-tailed object detection (LTOD) aims to handle the extreme data imbalance in real-world datasets, where many tail classes have scarce instances. One popular strategy is to explore extra data with image-level labels, yet it produces limited results due to (1) semantic ambiguity---an image-level label only captures a salient part of the image, ignoring the remaining rich semantics within the image; and (2) location sensitivity---the label highly depends on the locations and crops of the original image, which may change after data transformations like random cropping.To remedy this, we propose RichSem, a simple but effective method, which is robust to learn rich semantics from coarse locations without the need of accurate bounding boxes. RichSem leverages rich semantics from images, which are then served as additional soft supervision'' for training detectors. Specifically, we add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection. The semantic branch is only used for training and is removed during inference. RichSem achieves consistent improvements on both overall and rare-category of LVIS under different backbones and detectors.

detector, long-tailed object detection, rich semantic and coarse location, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.88)

Add feedback

SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection

Tran, Phi Vu

arXiv.org Artificial IntelligenceDec-28-2024

Recent years have witnessed tremendous advances on modern visual recognition systems. Despite such progress, many vision models still struggle with the open problem of learning from few exemplars. This paper focuses on the task of object detection in the setting where object classes follow a natural long-tailed distribution. Existing approaches to long-tailed detection resort to external ImageNet labels to augment the low-shot training instances. However, such dependency on a large labeled database is impractical and has limited utility in realistic scenarios. We propose a more versatile approach to leverage optional unlabeled images, which are easy to collect without the burden of human annotations. Our SimLTD framework is straightforward and intuitive, and consists of three simple steps: (1) pre-training on abundant head classes; (2) transfer learning on scarce tail classes; and (3) fine-tuning on a sampled set of both head and tail classes. Our approach can be viewed as an improved head-to-tail model transfer paradigm without the added complexities of meta-learning or knowledge distillation, as was required in past research. By harnessing supplementary unlabeled images, without extra image labels, SimLTD establishes new record results on the challenging LVIS v1 benchmark across both supervised and semi-supervised settings.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.20047

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Neural Information Processing SystemsOct-9-2024, 14:26:27 GMT

long-tailed object detection, model calibration, segmentation, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.90)

Add feedback

Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection

Yaman, Burhaneddin, Mahmud, Tanvir, Liu, Chun-Hao

arXiv.org Artificial IntelligenceMay-14-2023

We propose an embarrassingly simple method -- instance-aware repeat factor sampling (IRFS) to address the problem of imbalanced data in long-tailed object detection. Imbalanced datasets in real-world object detection often suffer from a large disparity in the number of instances for each class. To improve the generalization performance of object detection models on rare classes, various data sampling techniques have been proposed. Repeat factor sampling (RFS) has shown promise due to its simplicity and effectiveness. Despite its efficiency, RFS completely neglects the instance counts and solely relies on the image count during re-sampling process. However, instance count may immensely vary for different classes with similar image counts. Such variation highlights the importance of both image and instance for addressing the long-tail distributions. Thus, we propose IRFS which unifies instance and image counts for the re-sampling process to be aware of different perspectives of the imbalance in long-tailed datasets. Our method shows promising results on the challenging LVIS v1.0 benchmark dataset over various architectures and backbones, demonstrating their effectiveness in improving the performance of object detection models on rare classes with a relative $+50\%$ average precision (AP) improvement over counterpart RFS. IRFS can serve as a strong baseline and be easily incorporated into existing long-tailed frameworks.

instance-aware repeat factor sampling, long-tailed object detection

arXiv.org Artificial Intelligence

2305.08069

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback